ABSTRACT

The Critical Decision Method (CDM) is a structured
interview method for eliciting expert knowledge. The method was used
in a study of computer programmers who were experts at debugging
complex computer systems. Fifteen programmers who were identified by
their supervisors as experts were asked to describe an experience in
which their expertise made a difference. The CDM interviewer used a
set of questions to elaborate the programmers' responses. Two
non-expert programmers were interviewed for comparison. Each subject
told the story of a debugging experience four times with varying
degrees of intervention by the interviewers. Data were validated by a
panel of eight subjects who reviewed the findings of the interviews.
The results show that the CDM facilitated the identification and
description of expert skills (critical knowledge) as well as
resources and heuristics for selection and use of skills and
resources by the experts. The stories produced have the potential to
enhance training materials for other programmers. A 32-item list of
references is included. (SLD)

Abstract

The Critical Decision Method (CDM) is a structured interview method for eliciting expert knowledge. The
method was employed in a study of computer programmers who are expert at debugging complex computer
systems. The role of the CDM in that study is described and the method is critiqued.

Introduction

Developers of artificial intelligence systems commonly use interviewing techniques to study expert
performance. Training developers conduct interviews to develop course content material, though this is done too
rarely in general and too rarely involves experts in particular. In artificial intelligence (AI), the resulting products
have often proved limited in scope and fragile in character (i.e., they recover poorly, if at all, from trivial errors).
Though evaluated less often and less publicly, educational products may have similar failings.

These failings are in part due to the ways that researchers use interviews to elicit expert knowledge.
LaFrance (1989) has proposed three reasons that interview-based projects often fail. Interviewers may:

• Artificially restrict the breadth of the knowledge domain under investigation. 

• Fail to grasp the complexity of experts' goals. 

• Fail to identify the wide range of facts and knowledge structures critical to expert performance.

The interviewing methods employed in AI and educational development projects are of two types:
unstructured and structured.

An extreme example of the unstructured approach is the autobiographical interview (Langness and
Frank, 1981), in which the interviewer simply asks the subject to describe some event or period in detail. Such
interviews are unstructured in that the interviewer's choice of follow-up questions is largely ad hoc, lacking
theoretical foundation. These methods can elicit rich descriptions of the social and physical environments in
which problems are solved and of the skills and knowledge brought to bear upon them.

However, unstructured interviews can incur exorbitant costs for sampling across sites, sampling across
subjects, and data analysis. Unstructured methods tend to elicit highly compiled (Anderson, 19??) descriptions of
skills, knowledge of whose components may be critical to an audience of students (or AI programmers). Finally,
interview probes may vary in each interview, so interactions between questions and subjects or questions and
subject matter are constant threats to validity.

A less radical variety of unstructured interview is the think-aloud protocol (Ericsson and Simon, 1980). It
is a concurrent, or on-line, method in which the subject is asked to speak while working on some experimental
task, typically in a laboratory. In their review of interview-based research, Ericsson and Simon found that
instances of unreliable or invalid data in studies employing think-aloud protocols could be explained by
demonstrating that the information requested was not in working memory during the problem-solving episode.
Thus, think-aloud protocols concerning subjects' concurrent experiences are assumed to produce data of high
validity and reliability. However, the laboratory settings common to think-aloud studies artificially constrain the
definition of problems and the resources with which they are solved. (Notably, laboratory tasks often restrict such
resources as time, job-aids, and human advice). As Rogoff (1984) has noted, this is unrealistic insofar as
real-world problems arise in situ and are constrained by broader, real-world contexts.

Structured interviews resolve some of these problems. The "flexible" interview (Ginsburg, 1987, 1983) is a
refinement of the Piagetian clinical interview. Used for diagnosing children's mathematics skills, the method
involves presenting a subject with a problem, asking the child to talk while solving it, and then probing for details
of method with a limited range of open-ended questions such as:

• "How did you decide 12 plus 9 make 21?"

• "What did you do to get that answer?"

• "How would you teach someone to solve that problem?"

The interviewer then presents problems which test the range and consistency of application of the 
subject's reported method. 

The flexible interview elicits a detailed description of the processes of problem solving. However, it is
efficient only in domains where correct answers are known and solution methods are cataloged (such as
elementary mathematics).

The method explored in this paper is the Critical Decision Method (CDM), a variant on the Critical
Incident Technique (CIT) (Flanagan, 1954). The CDM was created to identify strategies used to make rapid
decisions involving high stakes (Klein, Calderwood, and Clinton-Cirocco, 1986) for training and training needs
assessment (Klein, Calderwood, and MacGregor, 1988). Subjects are asked to describe an experience in which
their expertise made a difference. The CDM interviewer employs a small set of probes, or questions (described
below) to elaborate the story. The probes help to identify environmental cues the expert employs in recognitional
decision-making (Klein, 1989), the actions the expert takes, the options the expert considers, and the expert's
criteria for choosing among those alternatives. The method has been applied to industrial studies of critical infant
care, fire-fighting, military action, and consumer-product purchasing decisions.

There is some overlap between the CDM and unstructured methods. For instance, Langness and Frank
(1981) note that autobiographical interviews tend to elicit critical incidents.

"Exceptional experiences... are detailed... And, presumably, these are events that strongly affected the
author's sense of self because, as one critic suggests, the author of an autobiography would have no
reason to write one unless some sort of inner transformation had occurred." (pg. 89)

Langness herself has used a method similar to the CDM, called event analysis (Langness and Frank,
1981) to examine incidents that facilitate or hinder "normal" life style for the mildly retarded. Like the CDM,
event analysis examines incidents in detail, but it elicits stories across a lifetime (rather than within recent months
or years), and involves interviews with all available participants.

The remainder of this paper will describe the CDM as it was applied to a study of how expert
programmers debug massive computer systems. Various authors have noted the failure of psychological research
to address the central complexities of computer program debugging. Brooks (1980) observes that code used as
experimental material is unrealistically brief. Sheil (1981) states that the laboratory setting biases subjects' use of
tools and selection of goals. (In fact it often dictates goal selection and promotes contaminating ancillary goals,
such as pleasing the experimenter). Pennington (19??) has asserted that this research contains many contradictory
findings at dubious levels of significance. As Pennington has stated,

"There are no clear, consistent effects for any of the variables [from language features through practices
such as using flow charts and on-line tools]. There is more individual difference than experimentally
manipulated variability."

The current project attempted to identify some of the complexities that have so far confounded
psychological research into program debugging, and it was designed to gain insight into how experts grapple
with these complexities. The goal of the project was to produce materials for a training program that could boost
the skills of several thousand programmers working for AT&T Bell Laboratories. That goal was met. A course
entitled "How Experts Debug Complex Systems" is currently being taught to well-enrolled classes throughout Bell
Laboratories.



The Expert Debugging Project and the CDM

Sampling 

Representative sampling is a common stumbling block in research involving the time of scarce and costly
experts. It was a primary consideration in this project. A survey of staff programming managers was conducted
across the corporation. The survey indicated wide variability in the information provided at the outset of
debugging assignments, the tools used, and the size of the software products. Accordingly, the CDM interviews
were diversified by project site and product size. Interviews were conducted with 15 programmers identified by
their supervisors as "experts." The researchers provided three criteria by which supervisors assessed expertise:

• Programmers to whom the supervisors gave complex problems; 

• Programmers to whom they referred other programmers; and 

• Programmers who debug systems more efficiently. 

Two programmers identified by supervisors as non-experts were also interviewed. One expert was 
interviewed on two occasions (concerning the same cases). The remainder were interviewed once. All interviews 
lasted three to four hours, in which time each expert told two or three stories. The research team was blind to the 
expertise level of most of the subjects during the interviews. However, expertise was self-evident. The 
researchers were able to independently assess the expertise of all subjects with high accuracy in an informal poll. 



Interview Methodology

At the beginning of each CDM interview, the principal interviewer introduced himself and one to
three other researchers attending the session. All of the researchers were cognitive psychologists; two had
programming experience. They recorded interviews on videotape and took notes on computer. Those with
programming experience acted as technical translators for the principal interviewer and methodology expert,
when necessary. The interviewer told each subject the purpose of the study, that a supervisor had recommended
him or her as a source, and that the content of the interview was confidential.

The interviewer then asked the expert to recall and describe a debugging incident in which his or her
skill was particularly important. Each story was told four times.

In the first recitation, the expert briefly told the debugging story, with little or no interruption. This
typically took several minutes.

In the second iteration, the interviewer presented the story in the form of a timeline on a whiteboard.
The interviewer and subject jointly edited it. The timeline listed environmental stimuli (e.g., "The system
erroneously reported lack of memory"), the subject's associated actions (e.g., "I examined the system's memory
allocation scheme"), and the timeframe of each action.
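
The timeline built in the second iteration can be pictured as a small record structure. The Python sketch below is an illustration only: the class name TimelineEntry, its field names, the minute values, and the second example entry are assumptions introduced here, not details reported by the study.

```python
from dataclasses import dataclass, field


@dataclass
class TimelineEntry:
    """One row of a hypothetical CDM timeline."""
    stimulus: str      # environmental cue the subject noticed
    action: str        # what the subject did in response
    start_min: int     # start of the timeframe, minutes from incident start
    end_min: int       # end of the timeframe
    options: list = field(default_factory=list)  # filled in during later probing
    criteria: str = ""                           # why the chosen option was selected


def build_timeline(entries):
    """Return entries ordered by start time, as they might appear on the whiteboard."""
    return sorted(entries, key=lambda e: e.start_min)


# The first stimulus/action pair is quoted from the text; the other is invented.
timeline = build_timeline([
    TimelineEntry("Allocation tables look normal", "I traced the failing call", 30, 90),
    TimelineEntry("The system erroneously reported lack of memory",
                  "I examined the system's memory allocation scheme", 0, 30),
])
```

The empty options and criteria fields stand for the material that the later probing would add to each entry.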

In the third instance, the interviewer employed a set of probes to identify decision points, options and
decision criteria. The interviewer elaborated the timeline accordingly. The following is a list of the questions
used, though the interviewer varied their selection, wording, and sequence:

• What did you know at this time?

• What options did you consider?

• Why did you choose this option?

• What experiences or training were needed?

• Was anyone else involved? What did they do?

• What materials or tools did you use? How did you use them?

In the fourth recitation of the story, experts identified potential pitfalls. This information highlighted
critical decision points, options, and decision criteria. A variety of questions facilitated this process, among them
several recommended by LaFrance (1988). The interviewer elicited detailed descriptions of problem characteristics
by asking naive questions or by positing the question, "What if I had been handed the problem at this stage,
unskilled as I am in this field?" Challenging the expert's report was occasionally productive. Using this strategy, the
interviewer played devil's advocate or sought exceptions to the generalizations experts made. The interviewer
also appealed to the experts' experience as mentors with probes such as "How might a less-experienced person
have erred here?"

A fifth type of recitation phase is under development. It is designed to produce videotapes of expert
stories for classroom use. Tapes of previous recitations tend to be unsatisfactory for several reasons. The first
recitation is often incomplete, the second iteration is dominated by an interviewer, the third is too lengthy, and
the fourth conveys only partial and speculative accounts. Furthermore, the latter few recitations tend to employ
dense terms whose meaning the expert and interviewer have refined throughout the interview. This phenomenon
is widely recognized in the social psychology literature (Isaacs and Clark, 1987; Krauss, 1987;
Kraut, Lewis, and Swezey, 1982). In the proposed fifth iteration, the interviewer and subject discuss
the language appropriate to the classroom audience and rehearse the explanation of key incidents if necessary.
The interviewer then uses the timeline as a score with which to "conduct" the expert in a final recitation.



Analysis 

In the analysis phase, the research team reviewed videotapes, transcripts, timelines, and personal notes to
identify expert decision-making patterns and the resources experts employ to debug systems.

Several decision-making strategies were identified, each of which is illustrated below:

• Recognitional decision making - Experts often made decisions using perceptual and recognitional cues,
rather than rational, analytical processes. This is typical of experts studied in other fields, such as chess
(DeGroot, 1978).

• Backward chaining - Experts' stories indicate that when the expert knows where a system failed, he or
she may search execution paths backward from the failure state to a known "good" state.

• Forward chaining - When the point of failure is a mystery or cannot be reproduced, experts typically
reason forward from the system's starting state to all possible states, hoping to find one that produces
the failure symptoms.

• Progressive deepening - Experts conducting forward or backward searches in well-known code
sometimes used the intelligent search strategy called progressive deepening (DeGroot, 1978). Using this
method, the expert uses judgement or recognitional capacities to select actions on a breadth-first basis,
explores them to an arbitrary depth, and continues recursively. Debugging experts were observed to use
this strategy in backward chaining patterns, as well as the forward-chaining patterns that DeGroot
observed.

• Explanation-based decisions - Experts used their experience to sort through the available data and
symptoms to construct a story of how the failure could be occurring. The story identifies the flaw at
some level of specificity, and serves as a basis for suggesting tests and debugging procedures. These in
turn modify, confirm, or reject the story and provide evidence for a better story. Pennington and Hastie
(1986) have presented the explanation-based decision model to account for the behavior of jurors
assessing guilt.

• Simulation - Kahneman and Tversky (1982) have described a process of mental simulation to imagine
the consequences of a course of action. Programmers simulated compilation or machine execution of the
compiled code to test how a suspected error could account for a set of symptoms, as well as to test an
outcome of a troubleshooting strategy.
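
As a rough illustration of the backward chaining pattern listed above, the sketch below searches a small state graph backward from a failure state until it reaches a state known to be good. The graph, the state names, and the function name backward_chain are hypothetical; the experts in the study reasoned over real execution paths, not over a dictionary like this one.

```python
from collections import deque


def backward_chain(predecessors, failure_state, known_good):
    """Breadth-first search backward from failure_state to any known-good state.

    predecessors maps each state to the states that can lead directly to it.
    Returns the path from the good state to the failure state, or None.
    """
    queue = deque([[failure_state]])
    seen = {failure_state}
    while queue:
        path = queue.popleft()
        state = path[-1]
        if state in known_good:
            return list(reversed(path))
        for prev in predecessors.get(state, []):
            if prev not in seen:
                seen.add(prev)
                queue.append(path + [prev])
    return None


# Hypothetical execution states, for illustration only.
predecessors = {
    "crash": ["write_buffer", "free_block"],
    "write_buffer": ["parse_input"],
    "free_block": ["parse_input"],
    "parse_input": ["startup"],
}
path = backward_chain(predecessors, "crash", {"startup"})
```

Forward chaining would run the same search over a successor map, starting from the system's starting state and stopping at a state that shows the failure symptoms.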

Experts typically used several of these strategies on a single debugging assignment. In one such instance,
a programmer used progressive deepening by ranking into three classes (could not, probably would not, or was
likely to have prompted an error message) the functions that called a complaining function. He started with the
likeliest functions, identified the ancestors of the error-reporting function, and proceeded backwards up through
the possible execution paths using backward chaining until he detected the flaw. He also employed forward
analysis to test one hypothesis with on-line execution-tracing tools, having previously employed mental
simulation to anticipate how this might work.
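
The anecdote above can be loosely approximated in code. The sketch below is not the programmer's actual procedure: it substitutes a standard iterative-deepening search over a call graph, visiting callers in order of the three likelihood classes, as a stand-in for progressive deepening. The call graph, the function names, the rank assignments, and the is_flawed test are all invented for the example.

```python
LIKELY, UNLIKELY, IMPOSSIBLE = 0, 1, 2  # the three classes from the anecdote


def ranked_callers(callers, rank, function):
    """Callers of `function` that could have prompted the message, likeliest first."""
    candidates = [c for c in callers.get(function, [])
                  if rank.get(c, UNLIKELY) != IMPOSSIBLE]
    return sorted(candidates, key=lambda c: rank.get(c, UNLIKELY))


def search_up(callers, rank, function, is_flawed, depth):
    """Depth-limited search up the call graph, visiting likelier callers first."""
    if is_flawed(function):
        return [function]
    if depth == 0:
        return None
    for caller in ranked_callers(callers, rank, function):
        found = search_up(callers, rank, caller, is_flawed, depth - 1)
        if found:
            return found + [function]
    return None


def progressive_deepening(callers, rank, start, is_flawed, max_depth=5):
    """Repeat the ranked search with a growing depth limit."""
    for depth in range(1, max_depth + 1):
        found = search_up(callers, rank, start, is_flawed, depth)
        if found:
            return found
    return None


# Hypothetical call graph: callers[f] lists the functions that call f.
callers = {
    "report_error": ["read_cfg", "alloc", "log_init"],
    "alloc": ["load_table"],
    "read_cfg": ["main"],
}
rank = {"read_cfg": UNLIKELY, "alloc": LIKELY, "log_init": IMPOSSIBLE}
result = progressive_deepening(callers, rank, "report_error",
                               is_flawed=lambda f: f == "load_table")
```

The returned list runs from the flawed ancestor down to the error-reporting function, which mirrors the path the programmer traced backward.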

Analysis of decision making in the CDM data can also address rarer strategies. For example,
"elimination by aspects" (Tversky, 1972) enables the expert to eliminate options that possess some characteristic or
fail to meet some criterion. "Multiattribute utility analysis" enables the expert to select the option that maximizes
some group of attributes.
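
The two rarer strategies can be contrasted in a few lines of Python. This is a simplified sketch under stated assumptions: the debugging options, their attribute values, and the weights are invented, and the aspects are applied in a fixed order rather than selected probabilistically as in Tversky's formulation.

```python
def eliminate_by_aspects(options, aspects):
    """Apply each test in turn, dropping the options that fail it.

    Stops early once a single option remains, and never eliminates every option.
    """
    remaining = list(options)
    for passes in aspects:
        survivors = [o for o in remaining if passes(o)]
        if survivors:
            remaining = survivors
        if len(remaining) == 1:
            break
    return remaining


def multiattribute_utility(options, weights):
    """Return the option with the highest weighted sum of attribute scores."""
    def utility(option):
        return sum(weights[name] * option[name] for name in weights)
    return max(options, key=utility)


# Invented debugging options and attribute values, for illustration only.
options = [
    {"name": "add trace statements", "needs_lab": False, "hours": 1, "value": 0.5},
    {"name": "run on test hardware", "needs_lab": True, "hours": 8, "value": 0.9},
    {"name": "read the allocation code", "needs_lab": False, "hours": 3, "value": 0.7},
]

# Elimination: rule out anything needing lab time, then anything over two hours.
survivors = eliminate_by_aspects(
    options, [lambda o: not o["needs_lab"], lambda o: o["hours"] <= 2]
)

# Utility: trade diagnostic value against hours spent.
best = multiattribute_utility(options, {"value": 10.0, "hours": -0.5})
```

With these invented numbers the two strategies disagree: elimination keeps the cheapest option, while the weighted sum prefers the option with the better balance of value and cost.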

The analysis phase revealed resources that experts use in their work and some of the key conditions
under which they use them. In contrast to the resources provided in programming studies, experts in realistic
environments commonly employ documents (including requirements, specifications, manuals, and the system
code), run-time debugging tools, and people (including other experts and clients). The researchers focussed on
heuristics that experts apply when using specific resources (e.g., when hunting for suspected timing bugs, be
aware of the impact on timing of tracing and breakpointing tools themselves) as well as heuristics that appear to
operate over all resources (e.g., run easy tests first).

The researchers also considered the character of the bugs experts discussed. Contrary to our
expectations, the most salient bugs to experts were not the sort of subtle semantic programming errors (e.g., a
stray semicolon or missing default case) catalogued in the few texts on debugging, but larger-scale design flaws
(e.g., omission of program logic or poorly sequenced actions). The focus of the course was altered accordingly.

Finally, the analysis phase served as an occasion to identify persuasive and coherent stories that were
eventually used in the debugging course.

Flanagan (1954) recommends a highly structured approach to CIT analysis. Its steps are to develop
categories of behavior, validate them, select behaviors to categorize, and categorize them. The researchers of the
current project followed this method insofar as they employed psychologically validated strategies as categories
and intentionally classified debugging behaviors at a high level.



Data Validation 

Data validation was conducted by a panel of eight experts chosen from among the subjects. During a
day-long session, the experts reviewed and refined general findings derived from the debugging stories described
in interviews.

In preparation for the panel session, the team circulated among the experts a summary of the findings,
and a questionnaire that listed major debugging strategies (e.g., forward and backward chaining) and techniques
(e.g., breakpointing, use of truth tables) mentioned in the interviews or the literature. Experts indicated how often
they used each method, how critical it was, and how effective. The responses (when weighted and summed)
produced a ranked list of debugging techniques. The panel opened with a round-robin discussion of the highest
rated of these methods, then moved on to discussions of broader skills and strategies.
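
The weighting and summing of questionnaire responses might look like the sketch below. The paper does not report the rating scale or the weights, so the 1-5 scale, the weights (1, 2, 2), the ratings themselves, and the function name rank_techniques are all assumptions made for illustration.

```python
def rank_techniques(responses, weights):
    """Rank techniques by the weighted sum of all experts' ratings.

    responses: {technique: [(frequency, criticality, effectiveness), ...]}
    weights:   (w_frequency, w_criticality, w_effectiveness)
    """
    totals = {}
    for technique, ratings in responses.items():
        totals[technique] = sum(
            sum(w * r for w, r in zip(weights, rating)) for rating in ratings
        )
    return sorted(totals, key=totals.get, reverse=True)


# Invented ratings on a 1-5 scale from two hypothetical experts.
responses = {
    "breakpointing": [(5, 3, 4), (4, 3, 3)],
    "truth tables": [(1, 2, 3), (2, 2, 2)],
    "backward chaining": [(4, 5, 5), (5, 4, 4)],
}
ranking = rank_techniques(responses, weights=(1, 2, 2))
```

The first entries of such a ranking would correspond to the highest-rated methods with which the panel discussion opened.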

The panel session was productive in several respects, and is recommended as a validation phase in CDM 
studies of expert behavior. 

First and foremost, the panel clarified and approved the team's findings. For example, expert debugging
is a less linear process than was implied in most of the individual interviews, in which time pressures or the
linear character of the timeline may have biased the data. An illustration is the expert habit of frequently pausing
between detailed tests to reexamine a problem at the highest level. In addition, the panel session highlighted the
importance of emotional cues that received scant attention in the analysis phase. For example, several experts
described the sense of incipient thrashing that signals them to seek out other experts for advice.

Gathering experts from many sites into a single room helped to resolve questions about the use of
strategies across environments. For example, the interviews suggested that only a few sites had discovered the
debugging heuristic, "Run easy tests first." The panel session revealed that no tests are easy in environments with
highly complex systems or where laboratory time is scarce and laboratory tests involve difficult set-up of
equipment. The rule is known in such environments, however.

Second, the panel session produced new technical tips and anecdotes, as well as videotaped testimonials
concerning the importance of skills critical to expertise (e.g., interviewing techniques), but not valued by
non-experts.

A third benefit of the panel session was that the experts' conversations set a standard for the level of
technical terminology that the authors could reasonably expect students to grasp. Experts' comments in the panel
were generally far less technical - and thus suited to a broader audience - than their language in on-site
interviews concerning specific system failures.

Finally, the experts endorsed the course developers' strategy of teaching non-expert debuggers high-level
strategies, rather than language- or system-specific tricks.

Thus, expert review of CDM findings can refine and validate findings, produce new data, set standards
for the language to be used in training materials, and validate strategies for the use of data. It should be noted
that reviews should, ideally, be conducted by experts who have not participated in interviews. In this case,
however, the interviews had so involved the expert subjects that every one agreed to attend the review at the
expense of his own department. Given a limited budget and the perceived objectivity of the experts involved, the
compromise seemed acceptable.



Benefits of the CDM



Experts' support of the research findings suggests that the Critical Decision Method elicited valid data.
There are several reasons one would expect the CDM to do so. 

Episodic memory, from which expert stories arise, has proven to be highly reliable and valid (Tulving,
1972; Bower, Black, and Turner, 1979). Evidence that memory is hierarchically structured (Kintsch et al., 1975;
Thorndyke, 1977; Chase and Ericsson, 1981) suggests that critical incidents within a story should be recalled
accurately. Ericsson and Simon (1980) caution that subjects may not accurately recall the details of a single
instance of an often-repeated task. However, the CDM does not elicit routine incidents.

Ericsson and Simon (1980) have reported that verbal protocols should exhibit high validity when
intermediate cognitive processing is excluded. The CDM largely controls two of the three intermediate processes
cited by Ericsson and Simon; it handles the third moderately well.

• Editing - Editorial filtering is minimized because each subject is asked to report everything he or she
recalls concerning a single event. The timeline provides an additional signal of and a check against
editing, as discussed below. Edgerton and Langness (19??) have introduced another check by eliciting
descriptions of each event from several participants. However, the salience of the incident and its details
may vary by subject. Thus, the reliability of reports may be poor across experts. It is interesting to note
that subjects in the debugging study did not seem to edit their stories at the highest level. Specifically,
they did not appear biased towards selecting success stories. The interviews elicited a number of
incidents in which the expert failed to learn the root cause of the problem, and resolved it by building
safeguards against its symptoms.

• Inferencing - Ericsson and Simon (1980) warn that "Interpretive probing, unlike the critical incident
technique, cannot be relied on to produce data stemming directly from the subjects' actual sequences of
thought processes." (pg. 221). For example, asking a subject why she took some action produces less
reliable data than asking what actions she took. CDM subjects are asked to report what they observed or
concluded during a single incident, not their assessments of events nor generalizations across incidents.
Thus, inferential and generative errors should be minimized. Subjects in the debugging study used
generalization (e.g., "I do this whenever I see symptoms like that.") principally as a last resort, to explain
decision-making at the recognitional level.

• Intermediate recoding - Information that is not stored verbally may not be verbalized accurately,
according to Ericsson and Simon (1980). Subjects of the CDM are invited not only to speak, but to draw
pictures to describe parts of the task that are best represented visually. By interviewing subjects on the
job site, the CDM interviewer allows the subject to use tools and other media to convey concepts that are
difficult to verbalize or draw. Videotaping is especially important for analysis of such material because it
captures visual, aural, and concurrent events more accurately than can a researcher's written notes.

Does forgetting bias CDM reports? The effect seems to be minimal. Crandall (1989) reports that the
CDM proved as evocative as think-aloud protocols in terms of the number of plans and actions, and the
specificity of goals elicited. Similarly, Flanagan (1954) asserts, on the basis of numerous CIT studies, that reports
of salient historical events are reasonably complete. Ericsson and Simon (1980) declined to address the accuracy
of retrieval from long-term memory, though they warn that the accuracy of retrospective reports may suffer from
forgetting or compilation that obscures details. While forgetting almost certainly affects CDM reports to some
extent, the use of the timeline signals memory or reporting problems to the interviewer. Unoccupied slots on the
timeline point the interviewer toward unreported events. Overlapping events in a time slot may indicate
inaccurate recall. Both conditions mark the need for intensive questioning. Finally, using the timeline to
represent the problem in its totality seems to help subjects recall details.
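
The two timeline signals described above, unoccupied slots and overlapping events, are easy to state as a check over time intervals. The sketch below is only an illustration: the function name check_timeline, the tuple layout, and the event data are assumptions, and in the study the interviewer made these judgements by eye on a whiteboard.

```python
def check_timeline(events, start, end):
    """Return (gaps, overlaps) for events given as (label, begin, finish) tuples."""
    ordered = sorted(events, key=lambda e: e[1])
    gaps, overlaps = [], []
    cursor = start    # latest time covered so far
    covering = None   # label of the event that extends to `cursor`
    for label, begin, finish in ordered:
        if begin > cursor:
            gaps.append((cursor, begin))         # unoccupied slot
        elif begin < cursor and covering is not None:
            overlaps.append((covering, label))   # two events share a time slot
        if finish > cursor:
            cursor, covering = finish, label
    if cursor < end:
        gaps.append((cursor, end))
    return gaps, overlaps


# Invented events, with times in minutes from the start of the incident.
events = [
    ("noticed error message", 0, 10),
    ("read logs", 5, 20),
    ("ran test", 40, 60),
]
gaps, overlaps = check_timeline(events, start=0, end=90)
```

Each reported gap or overlap marks a point where, in the paper's terms, intensive questioning is needed.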

The CDM has other benefits, in addition to the validity of the reports it elicits.

Like unstructured interviews, the CDM allows subjects to define the domain and the nature of the
expertise they exercise over it. For example, the results of this debugging study challenged assumptions common
in psychological experiments in this field.

• Each of thirty incidents involved social interaction. In laboratory experiments debuggers almost always
work alone.

• Most incidents involved the use of system tools or documentation. Few experimental situations do so.
(Gould, 1975, is an exception).

• The experts in the current study had as much as twenty years of professional experience (not counting
graduate education). Most studies employ seniors or graduate students from computer science
departments as experts.

• The code discussed by debugging experts ranged in size from ten lines (previously selected by
colleagues from within a massive system) to hundreds of thousands of lines. Laboratory experiments
uniformly have employed code that is 500 lines or shorter (rarely more than 250 lines), and written in
languages that promote a less dense programming style than C, the one discussed by many of the experts
interviewed in the current project.

• The debugging experts used information concerning the run-time and testing hardware as well as the
reliability of code authors and users. This information has not been provided to subjects in laboratory
experiments, to our knowledge.

As expected of a clinical interview, the CDM facilitates decomposition of expert stories into skills, such as 
how experts rank search paths during backwards analysis. 

The CDM may elicit fairly reliable reports. In the instance in which a subject repeated two stories at a
second interview session, the reports were consistent.

The CDM elicits eloquent reports, even from professionals whose principal claim is technological
knowledge, not interpersonal communication skill. The story-telling method seems to generate enthusiasm, as
indicated by the general tenor of the interviews and, more concretely, by the willingness of experts to attend the
distant, final panel session at their own expense.

Finally, the CDM is economical. Interviews take as little as two hours each. Videotapes of interviews
serve both as data and as course materials.



Disadvantages 
The CDM does have several shortcomings. 

Flanagan (1954) points out the difficulty of developing a classification scheme for observed behaviors.
He advises only that the scheme reflect the application of the findings. Thus, research intended to facilitate 
personnel selection might categorize behaviors in terms of personality traits. In the current project, targeted 
towards training, behaviors were categorized by strategy, resources used, and heuristics regarding resource 
selection and use. 

Cross-site differences are difficult to analyze, though judicious sampling can ensure that they are at least
evident. As mentioned previously, the rule "run cheap tests first" was not employed in some environments. Only
panel session interactions explained why.

Recall of details may not be accurate. Low-level discrepancies (e.g., the number of programming
variables) were found between subject recall and program code in the one instance which was investigated in
detail. Those discrepancies did not, however, invalidate significant aspects of the expert's recall.

The CDM is difficult to use with people who have a general and theoretical knowledge of a domain, but
little direct experience. These people simply don't have stories to tell. This was not the case in the debugging
study. "Hands-off" experts can contribute rules and declarative knowledge that, if valid, is best elicited using
other methods.

The CDM elicits the critical fringe of challenges - the ones that challenge experts. It does not elicit
information about common problems, except insofar as they are components of the rarer incidents. The research
sponsor must determine at the outset whether there is a greater marginal return for facilitating the solution of
common problems or the uncommon ones that the CDM elicits.

Finally, the CDM does not identify the optimal intervention. Thus, it can provide materials for training,
but no direction concerning their comparative value, nor concerning the organization or medium that best
represents them.



Summary 

The Critical Decision Method is an economical and productive method of structured interview. It is
particularly appropriate for exploratory investigations of expertise in complex domains. There is evidence
(Crandall, 1989) that CDM interviews produce valid and reliable reports of problem-solving incidents, and there is
a solid theoretical foundation for such a claim (Ericsson and Simon, 1980). This study did not test these claims,
however. The CDM facilitates decompilation of expert skills. It aids identification of critical knowledge and
resources, as well as heuristics for their selection and use. In addition, the method produces stories that can enrich
training materials by virtue of their content, their episodic format, and the respect that students attribute to the
expert sources.

We hope that other researchers will explore the method. Quantitative tests of the CDM would be
particularly useful. However, application of the method to a broader range of domains will almost certainly
reward both researchers in those domains and those who wish to refine knowledge elicitation techniques such as
the Critical Decision Method.